Distributionally Robust Classification on a Data Budget
Real-world uses of deep learning require predictable model behavior under
distribution shifts. Models such as CLIP show emergent natural distributional
robustness comparable to humans, but may require hundreds of millions of
training samples. Can we train robust learners in a domain where data is
limited? To rigorously address this question, we introduce JANuS (Joint
Annotations and Names Set), a collection of four new training datasets with
images, labels, and corresponding captions, and perform a series of carefully
controlled investigations of factors contributing to robustness in image
classification, then compare those results to findings derived from a
large-scale meta-analysis. Using this approach, we show that a standard ResNet-50
trained with the cross-entropy loss on 2.4 million image samples can attain
comparable robustness to a CLIP ResNet-50 trained on 400 million samples. To
our knowledge, this is the first result showing (near) state-of-the-art
distributional robustness on limited data budgets. Our dataset is available at
\url{https://huggingface.co/datasets/penfever/JANuS_dataset}, and the code used
to reproduce our experiments can be found at
\url{https://github.com/penfever/vlhub/}.

Comment: TMLR 2023; openreview link:
https://openreview.net/forum?id=D5Z2E8CNs
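The abstract's central comparison is between the ID-to-OOD accuracy drop of a supervised ResNet-50 and that of CLIP. A minimal sketch of that kind of comparison is below; the function name, model labels, and all accuracy numbers are illustrative, not results from the paper.

```python
# Hypothetical sketch: comparing distributional robustness of two models
# by their in-distribution (ID) vs. out-of-distribution (OOD) accuracy.
# All numbers and names below are made up for illustration.

def robustness_gap(id_acc: float, ood_acc: float) -> float:
    """Absolute accuracy drop when moving from ID to OOD evaluation."""
    return id_acc - ood_acc

# Illustrative accuracies for two hypothetical models.
models = {
    "resnet50_ce_2.4M": (0.76, 0.62),   # supervised, cross-entropy
    "clip_rn50_400M":   (0.73, 0.60),   # contrastive, web-scale
}

for name, (id_acc, ood_acc) in models.items():
    print(f"{name}: gap = {robustness_gap(id_acc, ood_acc):.2f}")
```

A smaller gap at comparable ID accuracy indicates better distributional robustness, which is the quantity the paper's controlled experiments measure.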
When Do Neural Nets Outperform Boosted Trees on Tabular Data?
Tabular data is one of the most commonly used types of data in machine
learning. Despite recent advances in neural nets (NNs) for tabular data, there
is still an active discussion on whether or not NNs generally outperform
gradient-boosted decision trees (GBDTs) on tabular data, with several recent
works arguing either that GBDTs consistently outperform NNs on tabular data, or
vice versa. In this work, we take a step back and question the importance of
this debate. To this end, we conduct the largest tabular data analysis to date,
comparing 19 algorithms across 176 datasets, and we find that the 'NN vs. GBDT'
debate is overemphasized: for a surprisingly high number of datasets, either
the performance difference between GBDTs and NNs is negligible, or light
hyperparameter tuning on a GBDT is more important than choosing between NNs and
GBDTs. A remarkable exception is the recently-proposed prior-data fitted
network, TabPFN: although it is effectively limited to training sets of size
3000, we find that it outperforms all other algorithms on average, even when
randomly sampling 3000 training datapoints. Next, we analyze dozens of
metafeatures to determine what properties of a dataset make NNs or GBDTs
better-suited to perform well. For example, we find that GBDTs are much better
than NNs at handling skewed or heavy-tailed feature distributions and other
forms of dataset irregularities. Our insights act as a guide for practitioners
to determine which techniques may work best on their dataset. Finally, with the
goal of accelerating tabular data research, we release the TabZilla Benchmark
Suite: a collection of the 36 'hardest' of the datasets we study. Our benchmark
suite, codebase, and all raw results are available at
https://github.com/naszilla/tabzilla.

Comment: NeurIPS Datasets and Benchmarks Track 202
Comparative economic evaluation of data from the ACRIN national CT colonography trial with three cancer intervention and surveillance modeling network microsimulations
Purpose: To estimate the cost-effectiveness of computed tomographic (CT) colonography for colorectal cancer (CRC) screening in average-risk asymptomatic subjects in the United States aged 50 years.

Materials and Methods: Enrollees in the American College of Radiology Imaging Network National CT Colonography Trial provided informed consent, and approval was obtained from the institutional review board at each site. CT colonography performance estimates from the trial were incorporated into three Cancer Intervention and Surveillance Modeling Network CRC microsimulations. Simulated survival and lifetime costs for screening 50-year-old subjects in the United States with CT colonography every 5 or 10 years were compared with those for guideline-concordant screening with colonoscopy, flexible sigmoidoscopy plus either sensitive unrehydrated fecal occult blood testing (FOBT) or fecal immunochemical testing (FIT), and no screening. Perfect and reduced screening adherence scenarios were considered. Incremental cost-effectiveness and net health benefits were estimated from the U.S. health care sector perspective, assuming a 3% discount rate.

Results: CT colonography at 5- and 10-year screening intervals was more costly and less effective than FOBT plus flexible sigmoidoscopy in all three models in both 100% and 50% adherence scenarios. Colonoscopy also was more costly and less effective than FOBT plus flexible sigmoidoscopy, except in the CRC-SPIN model assuming 100% adherence (incremental cost-effectiveness ratio: 50 000 per life-year gained).

Conclusion: All three models predict CT colonography to be more costly and less effective than non-CT colonographic screening but net beneficial compared with no screening, given model assumptions.
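The abstract's two key quantities, discounting at 3% and the incremental cost-effectiveness ratio (ICER), follow standard definitions, sketched below. All dollar amounts and life-year figures in the example are made up for illustration and are not trial results.

```python
# Standard cost-effectiveness arithmetic as used in the abstract:
# discounting future amounts at 3%, and the incremental
# cost-effectiveness ratio (ICER) of one strategy over another.
# All numbers below are illustrative, not trial data.

def present_value(amounts, rate=0.03):
    """Discount a per-year stream (year 0, 1, 2, ...) to present value."""
    return sum(a / (1 + rate) ** t for t, a in enumerate(amounts))

def icer(cost_a, effect_a, cost_b, effect_b):
    """Incremental cost per unit of effect of strategy A over B."""
    return (cost_a - cost_b) / (effect_a - effect_b)

# Strategy A costs 500 more and gains 0.5 life-years over strategy B.
print(icer(2000.0, 20.5, 1500.0, 20.0))  # 1000.0 per life-year gained
```

A strategy is "more costly and less effective" than a comparator, as CT colonography is described here, when its incremental cost is positive and its incremental effect is negative, so no ICER threshold can favor it (it is "dominated").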
Guidelines for the use and interpretation of assays for monitoring autophagy (3rd edition)
In 2008 we published the first set of guidelines for standardizing research in autophagy. Since then, research on this topic has continued to accelerate, and many new scientists have entered the field. Our knowledge base and relevant new technologies have also been expanding. Accordingly, it is important to update these guidelines for monitoring autophagy in different organisms. Various reviews have described the range of assays that have been used for this purpose. Nevertheless, there continues to be confusion regarding acceptable methods to measure autophagy, especially in multicellular eukaryotes. For example, a key point that needs to be emphasized is that there is a difference between measurements that monitor the numbers or volume of autophagic elements (e.g., autophagosomes or autolysosomes) at any stage of the autophagic process versus those that measure flux through the autophagy pathway (i.e., the complete process including the amount and rate of cargo sequestered and degraded). In particular, a block in macroautophagy that results in autophagosome accumulation must be differentiated from stimuli that increase autophagic activity, defined as increased autophagy induction coupled with increased delivery to, and degradation within, lysosomes (in most higher eukaryotes and some protists such as Dictyostelium) or the vacuole (in plants and fungi). In other words, it is especially important that investigators new to the field understand that the appearance of more autophagosomes does not necessarily equate with more autophagy. In fact, in many cases, autophagosomes accumulate because of a block in trafficking to lysosomes without a concomitant change in autophagosome biogenesis, whereas an increase in autolysosomes may reflect a reduction in degradative activity. It is worth emphasizing here that lysosomal digestion is a stage of autophagy and evaluating its competence is a crucial part of the evaluation of autophagic flux, or complete autophagy.
Here, we present a set of guidelines for the selection and interpretation of methods for use by investigators who aim to examine macroautophagy and related processes, as well as for reviewers who need to provide realistic and reasonable critiques of papers that are focused on these processes. These guidelines are not meant to be a formulaic set of rules, because the appropriate assays depend in part on the question being asked and the system being used. In addition, we emphasize that no individual assay is guaranteed to be the most appropriate one in every situation, and we strongly recommend the use of multiple assays to monitor autophagy. Along these lines, because of the potential for pleiotropic effects due to blocking autophagy through genetic manipulation, it is imperative to delete or knock down more than one autophagy-related gene. In addition, some individual Atg proteins, or groups of proteins, are involved in other cellular pathways, so not all Atg proteins can be used as a specific marker for an autophagic process. In these guidelines, we consider these various methods of assessing autophagy and what information can, or cannot, be obtained from them. Finally, by discussing the merits and limits of particular autophagy assays, we hope to encourage technical innovation in the field.